ASYMPTOTIC EFFICIENCY
OF THE KOLMOGOROV-SMIRNOV TEST¹

by 
University of Wisconsin at Madison

A simple derivation of asymptotic efficiency for the Kolmogorov-Smirnov statistic is given and evaluated for normal location and normal scale alternatives. Using equal samples to simplify the derivation, the limiting efficiency is obtained by letting the type I error tend to zero while the type II error goes to $\beta$, $0 < \beta < 1$. For symmetric location alternatives, the efficiency is the same as that obtained for the Mood and Brown median test.

Limits of relative efficiencies for alternatives which approach the null hypothesis are $2/\pi$ for normal location alternatives and $(\pi e)^{-1}$ for normal scale alternatives.


I.  INTRODUCTION 

Let $X_1, X_2, \ldots, X_m$ be independent with cumulative distribution function $F(x)$ and let $Y_1, Y_2, \ldots, Y_n$ be independent with c.d.f. $G(x)$. To test the hypothesis of equality of $F$ and $G$, the Kolmogorov-Smirnov statistics

$$D = \sup_x |F_m(x) - G_n(x)|, \qquad D^+ = \sup_x \,(F_m(x) - G_n(x)),$$

where $F_m$ and $G_n$ are the sample c.d.f.s, are often recommended.

In a recent article by Capon [5], bounds for limiting Pitman efficiency were derived. The purpose of this paper is to extend the asymptotic comparisons by employing a different limiting efficiency, as defined by Bahadur [3, p. 87].

With Pitman efficiency, the limiting ratio of sample sizes is derived with sample sizes, critical values, and alternatives adjusted so that both tests obtain limiting type I and type II errors $\alpha, \beta$ with $0 < \alpha, \beta < 1$. For the exact Bahadur efficiency which we consider, the alternative is kept fixed and critical values are adjusted so that the type II error approaches $\beta$ with $0 < \beta < 1$ and the type I error goes to zero (at an exponential rate) with increasing sample size. The exact Bahadur efficiency appears generally more informative than the Pitman efficiency as it depends upon the alternative. For those cases where both efficiencies have been computed, the exact Bahadur efficiency yields the Pitman value as a limit when the alternative approaches the null hypothesis; see, for example, Bahadur [3] and Klotz [8].


II. KOLMOGOROV-SMIRNOV COMPUTATIONS

For simplicity we restrict attention to the case of equal samples $m = n$ and the statistic $D^+$. For alternatives $F, G$, we reject the hypothesis $F = G$ if $D^+ \ge \rho_n$. We first show that if the critical values $\rho_n$ are chosen so that

$$\beta_n = P_{F,G}[D^+ < \rho_n] \to \beta, \qquad 0 < \beta < 1, \qquad \text{as } n \to \infty,$$

then $\rho_n$ converges to $\rho = \sup_x (F(x) - G(x))$.

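We show by contradiction that $\rho \le \liminf \rho_n \le \limsup \rho_n \le \rho$. Assume first that $\limsup \rho_n > \rho$. Under this assumption there exist a constant $c$ and a subsequence $\{n'\}$ for which $\rho_{n'} \to \limsup \rho_n > c > \rho$, and for all sufficiently large $n'$

$$\beta_{n'} = P[D^+ < \rho_{n'}] \ge P[\rho + U_{n'} + V_{n'} < \rho_{n'}] = P[U_{n'} + V_{n'} < \rho_{n'} - \rho] \ge P[U_{n'} + V_{n'} < c - \rho],$$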
which follows by writing

$$F_n - G_n = F - G + (F_n - F) + (G - G_n)$$

so that

$$\sup_x \,(F_n(x) - G_n(x)) \le \rho + U_n + V_n .$$

Here $U_n = \sup_x (F_n(x) - F(x))$, $V_n = \sup_x (G(x) - G_n(x))$, and $\rho$ is given above. Thus we have the contradiction (with $\beta < 1$) that

$$\lim \beta_{n'} \ge \lim P[U_{n'} + V_{n'} < c - \rho] = 1,$$

since $c - \rho > 0$ and $U_{n'} \xrightarrow{P} 0$, $V_{n'} \xrightarrow{P} 0$ by the Glivenko-Cantelli Lemma [9, p. 20]. We next show $\rho \le \liminf \rho_n$. Assume the converse, $\liminf \rho_n < \rho$, and write

$$F - G = F_n - G_n + (F - F_n) + (G_n - G)$$

so that

$$\rho \le D^+ + W_n + Z_n$$

with

$$W_n = \sup_x (F(x) - F_n(x)), \qquad Z_n = \sup_x (G_n(x) - G(x)).$$

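For a subsequence $\{n''\}$ for which $\rho_{n''} \to \liminf \rho_n < c < \rho$ (which exists by our assumption) we have, for all sufficiently large $n''$,

$$\beta_{n''} = P[D^+ < \rho_{n''}] \le P[\rho - (W_{n''} + Z_{n''}) < \rho_{n''}] = P[W_{n''} + Z_{n''} > \rho - \rho_{n''}] \le P[W_{n''} + Z_{n''} > \rho - c].$$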

The contradiction (with $\beta > 0$) follows from

$$\lim \beta_{n''} \le \lim P[W_{n''} + Z_{n''} > \rho - c] = 0, \qquad \rho - c > 0,$$

which is a consequence of $W_{n''} \xrightarrow{P} 0$, $Z_{n''} \xrightarrow{P} 0$, using the Glivenko-Cantelli Lemma again.
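The argument above rests on $D^+$ concentrating near $\rho = \sup_x (F(x) - G(x))$ under a fixed alternative, by the Glivenko-Cantelli Lemma. The Python simulation below is a minimal sketch of this for the normal location case; the helper names, the shift $\Delta = 1$, the seed, and the sample size are illustrative assumptions, not part of the paper.

```python
import random
from statistics import NormalDist


def d_plus(xs, ys):
    """One-sided two-sample statistic D+ = sup_x (F_m(x) - G_n(x)).

    F_m - G_n only changes at observations, so it is enough to evaluate it
    after each point of the pooled sample (plus the value 0 at -infinity).
    Assumes continuous data, i.e. no ties between the two samples.
    """
    m, n = len(xs), len(ys)
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    best, fx, gy = 0.0, 0, 0
    for _, from_y in pooled:
        if from_y:
            gy += 1
        else:
            fx += 1
        best = max(best, fx / m - gy / n)
    return best


def rho_location(delta):
    """rho = sup_x [Phi(x) - Phi(x - delta)] = 2 Phi(delta/2) - 1."""
    return 2 * NormalDist().cdf(delta / 2) - 1


random.seed(1)
n, delta = 20000, 1.0
xs = [random.gauss(0.0, 1.0) for _ in range(n)]    # sample from F = Phi
ys = [random.gauss(delta, 1.0) for _ in range(n)]  # sample from G = Phi(. - delta)
print(d_plus(xs, ys), rho_location(delta))  # the two values should be close
```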

Next, it is known (see, for example, Hodges [7]) that the principle of reflection gives the null distribution for equal samples

$$\alpha_n = P[D^+ \ge \rho_n] = \binom{2n}{n - n\rho_n} \Big/ \binom{2n}{n}, \qquad n\rho_n = 0, 1, \ldots, n.$$
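The reflection formula can be checked for small $n$ by brute force: under $F = G$ all $\binom{2n}{n}$ arrangements of the pooled sample are equally likely, and $nD^+$ is the maximum of the $\pm 1$ walk that steps up at an $x$ and down at a $y$. The Python sketch below (function names are hypothetical) compares this enumeration with $\binom{2n}{n-k}/\binom{2n}{n}$ for $\rho_n = k/n$.

```python
from itertools import combinations
from math import comb


def null_tail_by_enumeration(n, k):
    """P[D+ >= k/n] under F = G with m = n, by enumerating all C(2n, n)
    equally likely sets of pooled ranks occupied by the x's."""
    hits = 0
    for x_ranks in combinations(range(2 * n), n):
        x_set = set(x_ranks)
        walk = peak = 0  # walk = n * (F_n - G_n) after each pooled observation
        for r in range(2 * n):
            walk += 1 if r in x_set else -1
            peak = max(peak, walk)
        if peak >= k:
            hits += 1
    return hits / comb(2 * n, n)


def null_tail_reflection(n, k):
    """Reflection-principle formula C(2n, n - k) / C(2n, n)."""
    return comb(2 * n, n - k) / comb(2 * n, n)


for k in range(5):
    print(k, null_tail_by_enumeration(4, k), null_tail_reflection(4, k))
```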
Using Stirling's approximation in the combinatorials and considering alternatives $F, G$ for which $0 < \rho < 1$, we obtain, using $\lim \rho_n = \rho$,

$$\lim_{n \to \infty} -\frac{1}{n} \ln \alpha_n = (1-\rho)\ln(1-\rho) + (1+\rho)\ln(1+\rho). \qquad (2.1)$$
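Formula (2.1) can be checked numerically by evaluating the exact reflection probability with exact integer binomials for growing $n$. A minimal Python sketch, assuming $n\rho_n$ is taken as the integer nearest $n\rho$ (an illustrative choice):

```python
from math import comb, log


def ks_exponent(rho):
    """Right-hand side of (2.1): (1 - rho) ln(1 - rho) + (1 + rho) ln(1 + rho)."""
    return (1 - rho) * log(1 - rho) + (1 + rho) * log(1 + rho)


def exact_rate(n, rho):
    """-(1/n) ln alpha_n with alpha_n = C(2n, n - k) / C(2n, n), k = round(n * rho)."""
    k = round(n * rho)
    # math.log accepts arbitrarily large Python ints, so no overflow occurs here
    return -(log(comb(2 * n, n - k)) - log(comb(2 * n, n))) / n


for n in (50, 500, 5000):
    print(n, exact_rate(n, 0.4), ks_exponent(0.4))
```

The gap between the two columns shrinks like $1/n$, since the Stirling correction terms in numerator and denominator nearly cancel.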

If $F$ and $G$ have symmetric unimodal densities with $G(x) = F(x - \Delta)$, then $\rho = 2F(\Delta/2) - 1$. For normal location alternatives $F(x) = \Phi(x)$, $G(x) = \Phi(x - \Delta)$, the expression (2.1) reduces to

$$2\Phi(\Delta/2) \ln 2\Phi(\Delta/2) + 2\Phi(-\Delta/2) \ln 2\Phi(-\Delta/2). \qquad (2.2)$$

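Similarly, for normal scale alternatives $F(x) = \Phi(x/\sigma)$, $G(x) = \Phi(x/\tau)$, if we denote $\theta = \tau/\sigma$ with $\theta > 1$, the supremum of $F - G$ is attained at $x = \sigma\theta\sqrt{2\ln\theta/(\theta^2 - 1)}$ and we have

$$\rho = \Phi\!\left(\theta\sqrt{\frac{2\ln\theta}{\theta^2 - 1}}\right) - \Phi\!\left(\sqrt{\frac{2\ln\theta}{\theta^2 - 1}}\right). \qquad (2.3)$$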
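Both values of $\rho$ can be checked against a brute-force supremum of $F(x) - G(x)$ on a fine grid. The Python sketch below takes $\sigma = 1$ without loss of generality (so $\tau = \theta$); the grid limits, step count, and function names are illustrative assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def rho_location(delta):
    """rho for F = Phi(x), G = Phi(x - delta): 2 Phi(delta/2) - 1."""
    return 2 * PHI(delta / 2) - 1


def rho_scale(theta):
    """rho for F = Phi(x), G = Phi(x / theta), theta > 1, as in (2.3)."""
    a = sqrt(2 * log(theta) / (theta ** 2 - 1))
    return PHI(theta * a) - PHI(a)


def rho_grid(f, g, lo=-10.0, hi=10.0, steps=40001):
    """Brute-force max of f(x) - g(x) over an evenly spaced grid."""
    h = (hi - lo) / (steps - 1)
    return max(f(lo + i * h) - g(lo + i * h) for i in range(steps))


print(rho_location(1.0), rho_grid(PHI, lambda x: PHI(x - 1.0)))
print(rho_scale(2.0), rho_grid(PHI, lambda x: PHI(x / 2.0)))
```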


III. PARAMETRIC COMPUTATIONS

For the case of normal shift alternatives, the appropriate parametric test for comparison with the Kolmogorov-Smirnov test is the two-sample $t$ test. With equal samples, we reject if

$$t = \sqrt{n/2}\,(\bar y - \bar x)/S > C_n,$$

where

$$S^2 = \left[\sum_{i=1}^{n} (x_i - \bar x)^2 + \sum_{j=1}^{n} (y_j - \bar y)^2\right] \Big/ (2n - 2).$$


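In order for the type II error $\beta_n$ to converge to $\beta$ with $0 < \beta < 1$, it is sufficient that the critical value satisfy $C_n = \sqrt{n/2}\,\Delta + W$ for alternatives $F(x) = \Phi(x)$, $G(x) = \Phi(x - \Delta)$. This is shown by writing

$$\beta_n = P[\sqrt{n/2}\,(\bar y - \bar x)/S < C_n] = P[\sqrt{n/2}\,(\bar y - \bar x) < (\sqrt{n/2}\,\Delta + W)S]$$
$$= P[\sqrt{n/2}\,(\bar y - \bar x - \Delta) < \sqrt{n/2}\,\Delta(S - 1) + WS]$$
$$= \int_0^\infty \Phi\!\left(\sqrt{n/2}\,\Delta(s - 1) + Ws\right) dF_S(s), \qquad (3.1)$$

since $\sqrt{n/2}\,(\bar y - \bar x - \Delta)$ is standard normal and independent of $S$.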


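If we denote the random variable $U_n = 2\sqrt{n}\,(S - 1)$, then we know $U_n$ has a limiting standard normal distribution. Changing variables to $s = 1 + u/(2\sqrt n)$ and using the Helly-Bray theorem [9, p. 182], the expression (3.1) becomes

$$\beta_n = \int_{-2\sqrt n}^{\infty} \Phi\!\left(\frac{\Delta u}{\sqrt 8} + W\left(1 + \frac{u}{2\sqrt n}\right)\right) dF_{U_n}(u) \to \beta,$$

where

$$\beta = \int_{-\infty}^{\infty} \Phi\!\left(\frac{\Delta u}{\sqrt 8} + W\right) d\Phi(u) \quad \text{and} \quad 0 < \beta < 1.$$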
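The limit $\beta$ has a closed form, since $\int \Phi(au + W)\,d\Phi(u) = \Phi(W/\sqrt{1 + a^2})$; with $a = \Delta/\sqrt 8$ this gives $\beta = \Phi(W/\sqrt{1 + \Delta^2/8})$, so $W = \sqrt{1 + \Delta^2/8}\,\Phi^{-1}(\beta)$ attains any prescribed $\beta$. This closed form is a standard normal identity rather than a statement from the paper; the Python sketch below checks it against trapezoidal quadrature (function names and grid settings are illustrative).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def beta_by_quadrature(delta, w, span=10.0, points=20001):
    """Trapezoid-rule value of int Phi(delta*u/sqrt(8) + w) dPhi(u) over [-span, span]."""
    h = 2 * span / (points - 1)
    total = 0.0
    for i in range(points):
        u = -span + i * h
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * STD.cdf(delta * u / sqrt(8) + w) * STD.pdf(u)
    return total * h


def beta_closed_form(delta, w):
    """Phi(w / sqrt(1 + delta^2 / 8)), from E[Phi(aU + w)] = Phi(w / sqrt(1 + a^2))."""
    return STD.cdf(w / sqrt(1 + delta ** 2 / 8))


def w_for_beta(delta, beta):
    """Offset W in C_n = sqrt(n/2)*delta + W that yields limiting type II error beta."""
    return sqrt(1 + delta ** 2 / 8) * STD.inv_cdf(beta)


print(beta_by_quadrature(1.5, 0.3), beta_closed_form(1.5, 0.3))
```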


We next show

$$\lim_{n\to\infty} -\frac{1}{n}\log \alpha_n = \log\left(1 + (\Delta/2)^2\right). \qquad (3.2)$$


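With the critical value $C_n = \sqrt{n/2}\,\Delta + W$ we have under the null hypothesis

$$\alpha_n = P[t_{2n-2} > C_n] = K_n \int_{C_n}^{\infty} \left(1 + \frac{t^2}{2n-2}\right)^{-(2n-1)/2} dt, \qquad K_n = \frac{\Gamma((2n-1)/2)}{\sqrt{(2n-2)\pi}\;\Gamma(n-1)},$$

$$\alpha_n = K_n \left[\frac{2n-2}{(2n-3)\,C_n}\left(1 + \frac{C_n^2}{2n-2}\right)^{-(2n-3)/2} - \frac{(2n-2)^2}{(2n-3)(2n-5)\,C_n^3}\left(1 + \frac{C_n^2}{2n-2}\right)^{-(2n-5)/2} + R_n\right] \qquad (3.3)$$

where

$$0 < R_n < \frac{(2n-2)^2}{(2n-3)(2n-5)\,C_n^3}\left(1 + \frac{C_n^2}{2n-2}\right)^{-(2n-5)/2}.$$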


The expression (3.3) is obtained by using two terms in the Mills ratio expansion for the $t$-distribution derived by Pinkham and Wilk [11]. Thus the expression (3.2) follows by substituting $\sqrt{n/2}\,\Delta + W$ for $C_n$ in (3.3) and taking the limit.
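The two-term expansion can be sketched numerically. Integration by parts gives the first two terms of (3.3) with a remainder lying strictly between 0 and the second term, so $K_n$ times (first term minus second term) and $K_n$ times the first term alone bracket $\alpha_n$. The Python sketch below checks this bracket against Simpson's rule and evaluates $-\frac{1}{n}\log\alpha_n$ in log space, where it approaches $\log(1 + (\Delta/2)^2)$ slowly as $n$ grows. The helper names and the values $\Delta = 1$, $W = 0.5$ are illustrative assumptions.

```python
from math import exp, lgamma, log, log1p, pi, sqrt


def log_k(nu):
    """log of the t-density constant Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2))."""
    return lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)


def tail_bracket(c, nu):
    """(lower, upper) bounds on P[t_nu > c] from two integration-by-parts terms; nu > 3, c > 0."""
    k = exp(log_k(nu))
    base = 1 + c * c / nu
    first = nu / ((nu - 1) * c) * base ** (-(nu - 1) / 2)
    second = nu ** 2 / ((nu - 1) * (nu - 3) * c ** 3) * base ** (-(nu - 3) / 2)
    return k * (first - second), k * first


def tail_simpson(c, nu, width=100.0, steps=100000):
    """Reference value of P[t_nu > c] by Simpson's rule on [c, c + width] (steps even)."""
    h = width / steps
    f = lambda t: (1 + t * t / nu) ** (-(nu + 1) / 2)
    s = f(c) + f(c + width)
    s += sum((4 if i % 2 else 2) * f(c + i * h) for i in range(1, steps))
    return exp(log_k(nu)) * s * h / 3


def t_rate(n, delta, w):
    """-(1/n) log of the leading term of (3.3), with nu = 2n - 2 and C_n = sqrt(n/2)*delta + w."""
    nu, c = 2 * n - 2, sqrt(n / 2) * delta + w
    log_first = log_k(nu) + log(nu / ((nu - 1) * c)) - (nu - 1) / 2 * log1p(c * c / nu)
    return -log_first / n


print(tail_bracket(3.0, 20), tail_simpson(3.0, 20))
for n in (10, 1000, 100000):
    print(n, t_rate(n, 1.0, 0.5), log(1 + 0.25))
```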

For normal scale alternatives the parametric test used for comparison is the $F$ test for variances based upon the statistic $s_y^2/s_x^2$. For the comparisons under normality one might suppose that a better test could be found which takes advantage of the equality of the means. Equality of means is imposed for scale alternatives so that $F = G$ when $\sigma = \tau$, as required for the Kolmogorov-Smirnov null distribution. However, even if the means were known there would be a gain of at most one degree of freedom for the optimal statistic in the numerator and denominator, and the asymptotic results would be the same.


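For the $F$ statistic, if we denote the critical value by $d_n$, we must have $d_n \to \theta^2$ as $n \to \infty$ in order that $\beta_n \to \beta$ with $0 < \beta < 1$. We have

$$\beta_n = P[s_y^2/s_x^2 \le d_n] = P\!\left[\frac{s_y^2/\tau^2}{s_x^2/\sigma^2} \le \frac{d_n}{\theta^2}\right] = P[F_{n-1,n-1} \le d_n/\theta^2]$$

$$= P\!\left[\frac{F_{n-1,n-1} - (n-1)/(n-3)}{\sqrt{4(n-1)(n-2)/\{(n-3)^2(n-5)\}}} \le \frac{d_n/\theta^2 - (n-1)/(n-3)}{\sqrt{4(n-1)(n-2)/\{(n-3)^2(n-5)\}}}\right].$$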

Using the normal approximation, $\beta_n \to \beta$ provided

$$\frac{d_n/\theta^2 - (n-1)/(n-3)}{\sqrt{4(n-1)(n-2)/\{(n-3)^2(n-5)\}}} \to \Phi^{-1}(\beta) \quad \text{as } n \to \infty,$$

so that $d_n \to \theta^2$. We now show that for fixed $\theta$ and the above condition we have

$$\lim_{n\to\infty} -\frac{1}{n}\log \alpha_n = \log\{(1 + \theta^2)/(2\theta)\}. \qquad (3.4)$$


Transforming the $F$ distribution to the incomplete beta (see, for example, [1, p. 946]), we have under the null hypothesis

$$\alpha_n = P[s_y^2/s_x^2 > d_n] = \frac{\Gamma(n-1)}{\Gamma^2((n-1)/2)} \int_0^{x_n} [u(1-u)]^{(n-1)/2 - 1}\, du,$$

where

$$x_n = \frac{n-1}{(n-1) + (n-1)d_n} = \frac{1}{1 + d_n} \to \frac{1}{1 + \theta^2} < \frac{1}{2} \quad \text{for } \theta > 1.$$


Since $u(1-u)$ is an increasing function on the interval $(0, 1/2)$, we have

$$\alpha_n \le \frac{\Gamma(n-1)}{\Gamma^2((n-1)/2)}\, x_n\, [x_n(1 - x_n)]^{(n-1)/2 - 1}. \qquad (3.5)$$


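Also $0 < 1 - 2u < 1$ for $0 < u < 1/2$, so that

$$\alpha_n \ge \frac{\Gamma(n-1)}{\Gamma^2((n-1)/2)} \int_0^{x_n} [u(1-u)]^{(n-1)/2 - 1}(1 - 2u)\, du = \frac{\Gamma(n-1)}{\Gamma^2((n-1)/2)} \cdot \frac{2}{n-1}\,[x_n(1 - x_n)]^{(n-1)/2}. \qquad (3.6)$$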


Using Stirling's approximation to the gamma functions in (3.5) and (3.6) and $x_n \to 1/(1 + \theta^2)$, we derive (3.4).
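Bounds (3.5) and (3.6) can be evaluated in log space with `math.lgamma`. The Python sketch below (helper names hypothetical) takes $x_n$ at its limit $1/(1 + \theta^2)$, i.e. $d_n = \theta^2$, checks for a small $n$ that the bounds bracket a Simpson's-rule value of the incomplete beta integral, and shows both bounds giving the exponent $\log\{(1 + \theta^2)/(2\theta)\}$ of (3.4).

```python
from math import exp, lgamma, log


def log_bounds(n, theta):
    """(log lower, log upper) from (3.6) and (3.5), with x = 1/(1 + theta^2)."""
    x = 1 / (1 + theta ** 2)
    a = (n - 1) / 2
    log_const = lgamma(n - 1) - 2 * lgamma(a)  # log Gamma(n-1) / Gamma((n-1)/2)^2
    log_lower = log_const + log(2 / (n - 1)) + a * log(x * (1 - x))
    log_upper = log_const + log(x) + (a - 1) * log(x * (1 - x))
    return log_lower, log_upper


def alpha_simpson(n, theta, steps=20000):
    """Null P[s_y^2/s_x^2 > theta^2] as the incomplete beta integral, by Simpson's rule."""
    x = 1 / (1 + theta ** 2)
    a = (n - 1) / 2
    const = exp(lgamma(n - 1) - 2 * lgamma(a))
    h = x / steps
    f = lambda u: (u * (1 - u)) ** (a - 1)
    s = f(0.0) + f(x) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return const * s * h / 3


lo, hi = log_bounds(21, 2.0)
print(exp(lo), alpha_simpson(21, 2.0), exp(hi))
for n in (21, 1001, 100001):
    print(n, [-b / n for b in log_bounds(n, 2.0)], log((1 + 2.0 ** 2) / (2 * 2.0)))
```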


IV.  RELATIVE  EFFICIENCIES 


According to (2.1) of Section II, for a fixed alternative and critical values adjusted so that the type II error $\beta_n \to \beta$ ($0 < \beta < 1$), the type I error for the Kolmogorov-Smirnov test goes to zero at an exponential rate with increasing sample size:

$$\alpha_n = e^{-n e_k [1 + o(1)]},$$

where $e_k$ is the number given on the r.h.s. of (2.1). Similarly, for the parametric tests based upon samples of size $n^*$ we have

$$\alpha^*_{n^*} = e^{-n^* e^* [1 + o(1)]},$$

where the exponents $e^*$ are given by (3.2) and (3.4) for the $t$ and $F$ tests. Adjusting sample sizes so as to equate errors we have

$$n^* e^* (1 + o(1)) = n e_k (1 + o(1)).$$

Thus $\lim_{n\to\infty} n^*/n = e_k/e^*$ is a limiting efficiency.

For normal location alternatives the efficiency relative to the $t$-test is given by the ratio of (2.2) and (3.2):

$$e_{k,t}(\Delta) = \frac{2\Phi(\Delta/2)\ln 2\Phi(\Delta/2) + 2\Phi(-\Delta/2)\ln 2\Phi(-\Delta/2)}{\ln\left(1 + (\Delta/2)^2\right)}. \qquad (4.1)$$
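Expression (4.1) is straightforward to evaluate. A minimal Python sketch (the function name is hypothetical), using `log1p` in the denominator to keep precision for small $\Delta$:

```python
from math import log, log1p, pi
from statistics import NormalDist

PHI = NormalDist().cdf


def eff_ks_vs_t(delta):
    """Exact Bahadur efficiency (4.1) of the K-S test relative to the t test; delta != 0."""
    p, q = 2 * PHI(delta / 2), 2 * PHI(-delta / 2)
    return (p * log(p) + q * log(q)) / log1p((delta / 2) ** 2)


for delta in (0.5, 1.0, 2.0, 3.0):
    print(delta, eff_ks_vs_t(delta))
print(eff_ks_vs_t(1e-3), 2 / pi)  # small-shift value versus the limit 2/pi
```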


The expression (4.1) is the same as that obtained for the two-sample Mood and Brown median test relative to the two-sample $t$ for equal samples, and is also the same as that given by Bahadur [3, p. 88] for the sign test relative to the one-sample $t$ (with Bahadur's shift parameter replaced by $\Delta/2$). The limit of $e_{k,t}(\Delta)$ as $\Delta \to 0$ is $2/\pi$, which is the lower bound derived by Capon for the Pitman efficiency. It is thus conjectured that the Pitman limit is $2/\pi \approx .637$.
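The limit $2/\pi$ follows from expanding the numerator and denominator of (4.1) for small $\Delta$, using $\phi(0) = 1/\sqrt{2\pi}$ for the standard normal density $\phi$:

```latex
\begin{aligned}
\rho &= 2\Phi(\Delta/2) - 1 = \frac{\Delta}{\sqrt{2\pi}} + O(\Delta^3),\\
(1-\rho)\ln(1-\rho) + (1+\rho)\ln(1+\rho) &= \rho^2 + O(\rho^4) = \frac{\Delta^2}{2\pi} + O(\Delta^4),\\
\ln\bigl(1 + (\Delta/2)^2\bigr) &= \frac{\Delta^2}{4} + O(\Delta^4),\\
e_{k,t}(\Delta) &= \frac{\Delta^2/(2\pi) + O(\Delta^4)}{\Delta^2/4 + O(\Delta^4)} \;\to\; \frac{2}{\pi} \approx .637 \qquad (\Delta \to 0).
\end{aligned}
```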

For normal scale alternatives the efficiency relative to the $F$ test is given by

$$e_{k,F}(\theta) = \frac{(1-\rho)\ln(1-\rho) + (1+\rho)\ln(1+\rho)}{\ln\{(1 + \theta^2)/(2\theta)\}}, \qquad (4.2)$$

where $\theta = \tau/\sigma$ and $\rho$ is given by (2.3).

The limit of $e_{k,F}(\theta)$ as $\theta \to 1$ is $(\pi e)^{-1} = .117$, which is the same number obtained by Capon as a lower bound for the Pitman efficiency and by Bahadur [4] using an approximate definition of efficiency for the one-sample Kolmogorov test. It is similarly conjectured that this is also the Pitman efficiency value. Tables I and II give values for (4.1) and (4.2).
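Similarly, a minimal Python sketch of (4.2), with $\rho$ taken from (2.3) and $\sigma = 1$ (function names are hypothetical). Near $\theta = 1$ the numerator behaves like $\rho^2 \approx (\theta - 1)^2\phi(1)^2 = (\theta - 1)^2/(2\pi e)$ and the denominator like $(\theta - 1)^2/2$, which is where the limit $(\pi e)^{-1}$ comes from:

```python
from math import e, log, pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def rho_scale(theta):
    """(2.3): sup_x [Phi(x) - Phi(x / theta)] for theta = tau/sigma > 1 (sigma = 1)."""
    a = sqrt(2 * log(theta) / (theta ** 2 - 1))
    return PHI(theta * a) - PHI(a)


def eff_ks_vs_f(theta):
    """Exact Bahadur efficiency (4.2) of the K-S test relative to the F test; theta > 1."""
    rho = rho_scale(theta)
    numerator = (1 - rho) * log(1 - rho) + (1 + rho) * log(1 + rho)
    return numerator / log((1 + theta ** 2) / (2 * theta))


for theta in (1.5, 2.0, 3.0, 5.0):
    print(theta, eff_ks_vs_f(theta))
print(eff_ks_vs_f(1.001), 1 / (pi * e))  # near theta = 1 versus the limit 1/(pi e)
```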

Because there is no convenient closed-form expression for the Kolmogorov-Smirnov null distribution with unequal samples, the simple methods of this paper do not appear to extend to this case, and more complicated methods such as those studied by Hoadley [6] and Abrahamson [2] must be used. If the one-sided tests are replaced by the two-sided tests, the expressions (4.1) and (4.2) remain the same.

The small-sample interpolated efficiency values of Milton [10, p. II-32] for location seem to indicate that the limiting efficiency is approached by a decreasing sequence. The efficiencies given there for equal samples of size 7 are in the neighborhood of 75%.